The Hitchhiker's Guide to Facebook Web Tracking with Invisible Pixels and Click IDs
In recent years, advertising companies have used various tracking
methods to persistently track users across the web. Such tracking methods
usually include first and third-party cookies, cookie synchronization, as well
as a variety of fingerprinting mechanisms. Facebook (FB) recently introduced a
new tagging mechanism that attaches a one-time tag as a URL parameter (FBCLID)
on outgoing links to other websites. Although such a tag does not seem to have
enough information to persistently track users, we demonstrate that despite its
ephemeral nature, when combined with FB Pixel, it can aid in persistently
monitoring user browsing behavior across i) different websites, ii) different
actions on each website, iii) time, i.e., both in the past and in the future. We refer to this online monitoring of users as FB web tracking. We find
that FB Pixel tracks a wide range of user activities on websites with alarming
detail, especially on websites classified as sensitive categories under GDPR.
Also, we show how the FBCLID tag can be used to match, and thus de-anonymize, activities that FB Pixel tracked in the distant past (even before those users had a FB account). In fact, by combining this tag
with cookies that have rolling expiration dates, FB can also keep track of
users' browsing activities in the future as well. Our experimental results
suggest that 23% of the 10k most popular websites have adopted this technology and can contribute to this activity tracking on the web. Furthermore, our
longitudinal study shows that this type of user activity tracking can go as far
back as 2015. Simply put, if a user creates a FB account for the first time today, FB could, under some conditions, match their anonymously collected past web browsing activity, from as far back as 2015, to their newly created FB profile, and continue tracking their activity in the future.
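To make the matching idea above concrete, here is a minimal sketch of how an ephemeral click ID could bridge a platform identity and a browser cookie. The event log, click log, field names (fbp_cookie, fbclid, user_id), and values are invented for illustration; this is not the paper's dataset or FB's actual implementation.

```python
# Hypothetical illustration of the matching described above. All records and
# field names below are invented for this sketch.
from collections import defaultdict

pixel_events = [  # events a tracking pixel reports from third-party websites
    {"fbp_cookie": "fb.1.1111", "fbclid": "AbC123", "url": "https://clinic.example/results", "action": "PageView"},
    {"fbp_cookie": "fb.1.1111", "fbclid": None, "url": "https://shop.example/cart", "action": "AddToCart"},
]
click_log = [  # outgoing clicks from the platform, each tagged with a one-time FBCLID
    {"user_id": "user-42", "fbclid": "AbC123", "clicked": "https://clinic.example"},
]

# Step 1: the one-time FBCLID links a platform identity to a first-party cookie.
fbclid_to_user = {c["fbclid"]: c["user_id"] for c in click_log}
cookie_to_user = {}
for e in pixel_events:
    if e["fbclid"] in fbclid_to_user:
        cookie_to_user[e["fbp_cookie"]] = fbclid_to_user[e["fbclid"]]

# Step 2: once the cookie is tied to an identity, every other pixel event carrying
# that cookie (past or future, with or without an FBCLID) becomes attributable.
history = defaultdict(list)
for e in pixel_events:
    user = cookie_to_user.get(e["fbp_cookie"])
    if user:
        history[user].append((e["url"], e["action"]))

print(dict(history))
# {'user-42': [('https://clinic.example/results', 'PageView'),
#              ('https://shop.example/cart', 'AddToCart')]}
```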
FNDaaS: Content-agnostic Detection of Fake News sites
Automatic fake news detection is a challenging problem in the fight against the spread of misinformation, and it has tremendous real-world political and social impact. Past
studies have proposed machine learning-based methods for detecting such fake
news, focusing on different properties of the published news articles, such as
linguistic characteristics of the actual content, which, however, are limited by language barriers. Departing from such efforts, we propose FNDaaS, the first automatic, content-agnostic fake news detection method, which considers new and previously unstudied features such as the network and structural characteristics of each news website. The method can be offered as-a-Service, either at the ISP side for easier scalability and maintenance, or at the user side for better end-user privacy. We demonstrate the efficacy of our
method using data crawled from existing lists of 637 fake and 1183 real news
websites, and by building and testing a proof of concept system that
materializes our proposal. Our analysis of data collected from these websites
shows that the vast majority of fake news domains are very young and appear to keep an IP address associated with their domain for shorter periods than real news ones. By conducting various experiments with machine learning classifiers, we
demonstrate that FNDaaS can achieve an AUC score of up to 0.967 on past sites,
and 77-92% accuracy on newly-flagged ones.
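As a rough illustration of the kind of content-agnostic classification the abstract describes, the sketch below trains a standard classifier on per-website network/structural features such as domain age and IP-association duration. The feature set, values, and model choice are assumptions for the sketch, not the FNDaaS implementation.

```python
# Sketch of content-agnostic classification over per-website network/structural
# features of the kind mentioned above. All feature values are invented placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row: [domain_age_days, mean_days_ip_associated, num_ip_changes]
X = np.array([
    [3650, 900, 2],   # long-lived real news site with a stable IP
    [2800, 700, 3],
    [45,   10,  6],   # very young fake news site churning IPs
    [30,    7,  8],
])
y = np.array([0, 0, 1, 1])  # 0 = real, 1 = fake

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=2, scoring="roc_auc")
print("AUC per fold:", scores)
```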
Shadow Honeypots
We present Shadow Honeypots, a novel hybrid architecture that combines the best features of honeypots and anomaly detection. At a high level, we use a variety of anomaly detectors to monitor all traffic to a protected network or service. Traffic that is considered anomalous is processed by a "shadow honeypot" to determine the accuracy of the anomaly prediction. The shadow is an instance of the protected software that shares all internal state with a regular ("production") instance of the application, and is instrumented to detect potential attacks. Attacks against the shadow are caught, and any incurred state changes are discarded. Legitimate traffic that was misclassified is validated by the shadow and handled correctly by the system, transparently to the end user. The outcome of processing a request by the shadow is used to filter future attack instances and could be used to update the anomaly detector. Our architecture allows system designers to fine-tune systems for performance, since false positives will be filtered by the shadow. We demonstrate the feasibility of our approach in a proof-of-concept implementation of the Shadow Honeypot architecture for the Apache web server and the Mozilla Firefox browser. We show that despite a considerable overhead in the instrumentation of the shadow honeypot (up to 20% for Apache), the overall impact on the system is diminished by the ability to minimize the rate of false positives.
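The routing logic described above can be sketched as follows; the anomaly detector, the shadow's attack check, and the shared state are simplified stand-ins, not the paper's Apache or Firefox instrumentation.

```python
# Simplified stand-in for the architecture described above: an anomaly detector
# routes suspicious requests to a state-sharing "shadow" instance that catches
# attacks and rolls back their side effects. All checks here are toy heuristics.
attack_filter = set()    # requests the shadow has already confirmed as attacks
app_state = {"hits": 0}  # state shared by the production and shadow instances

def looks_anomalous(request: str) -> bool:
    # placeholder anomaly detector: flag oversized or oddly encoded payloads
    return len(request) > 64 or "%u" in request

def shadow_instance(request: str) -> bool:
    """Process the request in the instrumented shadow copy.
    Return True if an attack is detected; in that case roll back state changes."""
    snapshot = dict(app_state)
    app_state["hits"] += 1                 # the shadow mutates the shared state
    attack = "overflow" in request         # stand-in for memory-violation checks
    if attack:
        app_state.clear()
        app_state.update(snapshot)         # discard any incurred state changes
    return attack

def production_instance(request: str) -> str:
    app_state["hits"] += 1
    return "served"

def handle(request: str) -> str:
    if request in attack_filter:
        return "dropped"                   # known attack instance, filtered outright
    if looks_anomalous(request):
        if shadow_instance(request):
            attack_filter.add(request)     # outcome used to filter future instances
            return "dropped"
        return "served"                    # false positive: shadow handled it, state kept
    return production_instance(request)    # normal fast path

print(handle("GET /index.html"))           # served
print(handle("GET /" + "overflow" * 20))   # dropped (caught by the shadow)
```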
Speeding up TCP/IP: Faster Processors are not Enough
Over the last decade we have witnessed a tremendous increase in the capacity of our computing and communication systems. On the one hand, processor speeds have been increasing exponentially, doubling every 18 months or so, while network bandwidth has followed a similar (if not higher) rate of improvement, doubling every 9-12 months or so. Unfortunately, applications that communicate frequently using standard protocols like TCP/IP do not seem to improve at similar rates.
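A back-of-the-envelope illustration of the growth rates quoted in the abstract: compounding the stated doubling intervals over a decade shows how far network bandwidth outpaces processor speed, which is why protocol processing, rather than raw CPU speed, becomes the bottleneck.

```python
# Compounded improvement over one decade at the doubling intervals quoted above.
decade_months = 120
for label, doubling_months in [("processor speed (x2 every 18 months)", 18),
                               ("network bandwidth (x2 every 12 months)", 12),
                               ("network bandwidth (x2 every 9 months)", 9)]:
    factor = 2 ** (decade_months / doubling_months)
    print(f"{label}: ~{factor:,.0f}x over 10 years")
# processor speed grows ~100x, while network bandwidth grows ~1,000-10,000x
```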
On caching search engine query results
In this paper we explore the problem of caching search engine query results in order to reduce the computing and I/O requirements needed to support the functionality of a search engine for the World Wide Web. We study query traces from the EXCITE search engine and show that they have a significant amount of temporal locality: that is, a significant percentage of the queries have been submitted more than once by the same or a different user. Using trace-driven simulation we demonstrate that medium-size caches can hold the results of most of the frequently-submitted queries. Finally, we compare the effectiveness of static and dynamic caching and conclude that although dynamic caching can use large caches more effectively, static caching can perform better for (very) small caches.
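A toy illustration of the static-versus-dynamic comparison described above, using an invented query trace rather than the EXCITE traces: "static" caches the most frequent queries of a warm-up period, while "dynamic" is a plain LRU cache.

```python
# Static vs. dynamic caching of query results on a synthetic trace.
from collections import Counter, OrderedDict

trace = ["cars", "mp3", "cars", "weather", "mp3", "cars", "python", "mp3", "weather", "cars"]
CACHE_SIZE = 2

# Static cache: fix the top-k queries observed during a warm-up period.
warmup, test = trace[: len(trace) // 2], trace[len(trace) // 2:]
static_cache = {q for q, _ in Counter(warmup).most_common(CACHE_SIZE)}
static_hits = sum(q in static_cache for q in test)

# Dynamic cache: LRU replacement over the same test portion of the trace.
lru, dynamic_hits = OrderedDict(), 0
for q in test:
    if q in lru:
        dynamic_hits += 1
        lru.move_to_end(q)
    else:
        lru[q] = True
        if len(lru) > CACHE_SIZE:
            lru.popitem(last=False)   # evict the least recently used query

print(f"static hits: {static_hits}/{len(test)}, dynamic hits: {dynamic_hits}/{len(test)}")
```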
Visualizing Working Sets
It is widely known that most applications exhibit locality of reference. That is, applications access only a subset of their pages during any phase of their execution. This subset of pages is usually called the working set of the application. In this note we present the working sets of applications in pictorial form so that they can be easily viewed and understood. Based on these working set "pictures" we make observations about the size, the duration, and the regularity of the working sets of various applications. Our applications cover several domains, including numerical applications, program development tools, CAD simulations, and database applications. Our results suggest that most numerical and some database applications have regular access patterns and good locality of reference. Although most database and program development applications seem to have little locality of reference, careful observations at the appropriate granularity reveal regular access patterns in these applications as well.
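As a sketch of how such working set "pictures" can be derived from a page-reference trace, the snippet below counts the distinct pages touched within a sliding window over a synthetic trace; the window size and the trace itself are arbitrary assumptions.

```python
# Working set size over time: distinct pages referenced within a sliding window.
def working_set_sizes(trace, window):
    return [len(set(trace[max(0, t - window + 1): t + 1])) for t in range(len(trace))]

# Synthetic trace: a phase touching pages 0-3, then a phase touching pages 10-11.
trace = [0, 1, 2, 3] * 5 + [10, 11] * 10
for t, s in enumerate(working_set_sizes(trace, window=8)):
    print(f"t={t:2d} |{'#' * s}")   # crude textual "picture" of the working set size
```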